The Advantage Learning Operator

Author

  • Greg Farquhar
Abstract

Value-based reinforcement learning typically involves the repeated application of an update rule, such as the Bellman operator T_B, to an action-value function. Recent work has explored alternative operators that remain optimality-preserving and may improve performance. In this report, I study in particular the advantage learning operator, T_AL Q = T_B Q − α(V − Q). A theoretical analysis of learning as an estimator of Q-value ordering shows that advantage learning may compensate for high Q-learning step sizes. I further show that advantage learning grants increased robustness to stochasticity, and discuss the importance of committing to current estimates of Q-value ordering and of increasing action gaps. Finally, I propose two algorithms for online optimization of the advantage learning parameter α, demonstrating successful proofs of concept in simple MDPs.
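
As an illustration of the operator named in the abstract, the following is a minimal tabular sketch, not the report's own implementation. It assumes the usual reading V(s) = max_a Q(s, a), which the abstract does not spell out. The one-state MDP, the function names, and the values of gamma and alpha are all hypothetical choices made for this example.

```python
def bellman_operator(Q, P, R, gamma):
    """(T_B Q)(s, a) = R(s, a) + gamma * E[max_b Q(s', b)]."""
    V = [max(q_s) for q_s in Q]  # assumed: V(s) = max_a Q(s, a)
    return [
        [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def advantage_learning_operator(Q, P, R, gamma, alpha):
    """(T_AL Q)(s, a) = (T_B Q)(s, a) - alpha * (V(s) - Q(s, a))."""
    V = [max(q_s) for q_s in Q]
    TQ = bellman_operator(Q, P, R, gamma)
    return [
        [TQ[s][a] - alpha * (V[s] - Q[s][a]) for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def iterate(operator, Q, n_iters=500):
    """Apply a one-argument operator to Q repeatedly."""
    for _ in range(n_iters):
        Q = operator(Q)
    return Q


# Hypothetical toy MDP: a single state with two actions, both looping back.
# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
P = [[[(0, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0]]
gamma, alpha = 0.9, 0.5
Q0 = [[0.0, 0.0]]

Q_bellman = iterate(lambda Q: bellman_operator(Q, P, R, gamma), Q0)
Q_al = iterate(lambda Q: advantage_learning_operator(Q, P, R, gamma, alpha), Q0)

# Action gap: value of the greedy action minus value of the other action.
gap_bellman = Q_bellman[0][0] - Q_bellman[0][1]
gap_al = Q_al[0][0] - Q_al[0][1]
print("Bellman fixed point:", Q_bellman, "gap:", gap_bellman)
print("Advantage learning fixed point:", Q_al, "gap:", gap_al)
```

In this toy case both operators agree on the greedy action and on its value, while the gap to the other action grows from 1 under T_B to 2 under T_AL, which is the kind of action-gap increase the abstract refers to.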

Similar resources

The Effect of Knowledge Management Capabilities and Information Technology on Innovative Performance with Mediating Role of Entrepreneurship, Learning and Competitive Advantage

The aim of this study was to determine the effect of knowledge management capabilities and information technology capabilities on innovative performance with the mediating role of organizational entrepreneurship, organizational learning and competitive advantage. The research method is descriptive-survey. The statistical population consists of all employees of Shimifar Iran Company, and 137 peo...

The Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS

The extended classifier system (XCS) tries to solve learning problems online by producing a set of rules (classifiers). XCS is a rather complex combination of a genetic algorithm and reinforcement learning: it uses the genetic algorithm to discover promising rules and values them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...

Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect

This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...

Absorptive Capability and Competitive Advantage: Some Insights from Indian Pharmaceutical Industry

Every firm learns through firm-specific methods. This learning process is operationalized by the firm's knowledge management practices. Therefore, for knowledge to result in successful learning, it should be supported by a combinative framework that can enhance the firm's absorptive capability. This in turn will play a decisive role in achieving competitive advantage. Current literature in strategic managem...

Publication date: 2016